Search CORE

66 research outputs found

A bootstrapping architecture for time expression recognition in unlabelled corpora via syntactic-semantic patterns

Author: Poveda Poveda Jordi
Surdeanu Mihai
Turmo Borras Jorge
Publication date: 01/01/2007

In this paper we describe a semi-supervised approach, based on bootstrapping, to the extraction of time expression mentions from large unlabelled corpora. Bootstrapping techniques rely on a relatively small set of initial human-supplied examples (termed “seeds”) of the type of entity or concept to be learned, which are used to capture an initial set of patterns or rules from the unlabelled text that extract the supplied data. In turn, the learned patterns are employed to find new potential examples, and the process is repeated to grow the set of patterns and (optionally) the set of examples. To prevent the learned pattern set from producing spurious results, it is essential to implement a ranking and selection procedure that filters out “bad” patterns and, depending on the case, new candidate examples. Therefore, the type of patterns employed (knowledge representation) and the ranking and selection procedure are paramount to the quality of the results. We present a complete bootstrapping algorithm for the recognition of time expressions, with special emphasis on the type of patterns used (a combination of semantic and morpho-syntactic elements) and on the ranking and selection criteria. Bootstrapping techniques have previously been employed, with limited success, for several NLP problems, both of recognition and classification, but their application to time expression recognition is, to the best of our knowledge, novel. As of this writing, the described architecture is in the final stages of implementation, with experimentation and evaluation already underway. Postprint (published version)
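The seed, pattern, rank, and grow cycle described in this abstract can be illustrated with a toy sketch. It is not the authors' algorithm: the patterns here are only (left word, right word) contexts around a single token, far simpler than the paper's syntactic-semantic patterns, and the ranking function is an assumed RlogF-style score (precision on known examples times log2(hits + 1)). The corpus, seeds, and function names are hypothetical.

```python
import math
from collections import defaultdict


def extract_contexts(corpus):
    """Map each (left word, right word) context to the tokens it surrounds."""
    contexts = defaultdict(list)
    for sent in corpus:
        for i in range(1, len(sent) - 1):
            contexts[(sent[i - 1], sent[i + 1])].append(sent[i])
    return contexts


def rlogf(matches, known):
    """Assumed RlogF-style score: precision on known examples * log2(hits + 1)."""
    hits = sum(1 for m in matches if m in known)
    return (hits / len(matches)) * math.log2(hits + 1) if hits else 0.0


def bootstrap(corpus, seeds, iterations=3, top_k=1):
    known = set(seeds)   # example set, grows every round
    selected = []        # accepted patterns, in acceptance order
    contexts = extract_contexts(corpus)
    for _ in range(iterations):
        # Rank every pattern not yet accepted against the current examples.
        scores = {p: rlogf(m, known) for p, m in contexts.items() if p not in selected}
        best = sorted((p for p in scores if scores[p] > 0),
                      key=scores.get, reverse=True)[:top_k]
        if not best:
            break        # no remaining pattern is supported by known examples
        for p in best:
            selected.append(p)
            known.update(contexts[p])  # accept everything the pattern extracts
    return known, selected


corpus = [s.split() for s in [
    "we met on monday morning",
    "she left on tuesday morning",
    "they arrived on friday morning",
    "we met on friday evening",
    "they left on saturday evening",
    "he sat on chair quietly",
]]
known, patterns = bootstrap(corpus, seeds={"monday", "tuesday"})
print(sorted(known))
print(patterns)
```

The first round accepts `("on", "morning")`, which yields `friday`; the second round can then accept `("on", "evening")`, which yields `saturday`. Keeping `top_k` small is what the ranking-and-selection step buys: only the best-supported pattern is accepted each round, so an unsupported context such as `("on", "quietly")` never pulls in `chair`.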

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Verb similarity: comparing corpus and psycholinguistic data

Author: Castellón Masalles Irene
Coll-Florit Marta
Gil Vallejo Lara
Turmo Jordi
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 26/01/2017

Similarity, which plays a key role in fields like cognitive science, psycholinguistics and natural language processing, is a broad and multifaceted concept. In this work we analyse how two approaches that belong to different perspectives, the corpus view and the psycholinguistic view, articulate similarity between verb senses in Spanish. Specifically, we compare the similarity between verb senses based on their argument structure, which is captured through semantic roles, with their similarity defined by word associations. We address the question of whether verb argument structure, which reflects the expression of the events, and word associations, which are related to the speakers' organization of the mental lexicon, shape similarity between verbs in a congruent manner, a topic which has not been explored previously. While we find significant correlations between verb sense similarities obtained from these two approaches, our findings also highlight some discrepancies between them and the importance of the degree of abstraction of the corpus annotation and psycholinguistic representations.
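The abstract does not specify the similarity measures, so the following is only an illustrative sketch of the kind of comparison it describes: argument-structure similarity as Jaccard overlap of semantic-role sets, association similarity as cosine over word-association counts, and agreement between the two as a Spearman rank correlation over all verb pairs. The verbs, roles, and association counts are invented toy data, and the choice of measures is an assumption.

```python
import math
from itertools import combinations


def jaccard(a, b):
    """Overlap of two semantic-role sets."""
    return len(a & b) / len(a | b)


def cosine(u, v):
    """Cosine similarity of two sparse association-count vectors (dicts)."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Invented toy data for four Spanish verbs.
roles = {"comer": {"Agent", "Theme"}, "beber": {"Agent", "Theme"},
         "correr": {"Agent", "Path"}, "caminar": {"Agent", "Path"}}
assoc = {"comer": {"comida": 5, "hambre": 3, "mesa": 2},
         "beber": {"agua": 5, "sed": 3, "mesa": 1},
         "correr": {"prisa": 4, "deporte": 3, "pies": 2},
         "caminar": {"pies": 4, "calle": 3, "deporte": 1}}

pairs = list(combinations(roles, 2))
corpus_sim = [jaccard(roles[a], roles[b]) for a, b in pairs]
psych_sim = [cosine(assoc[a], assoc[b]) for a, b in pairs]
rho = spearman(corpus_sim, psych_sim)
print(f"Spearman rho = {rho:.3f}")
```

A rank correlation is used here because the two similarity scales are not directly comparable; only whether they order verb pairs the same way matters.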

The Oberta in open access

Verb similarity: Comparing corpus and psycholinguistic data

Author: Castellón Masalles Irene
Coll-Florit Marta
Gil-Vallejo Lara
Turmo Jordi
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 14/05/2020

Dipòsit Digital de la Universitat de Barcelona

Inductive logic programming and its application to the temporal expression chunking problem

Author: Poveda Poveda Jordi
Turmo Borras Jorge
Publication date: 01/01/2007

This document first introduces general notions about ILP (inductive logic programming), including a basic vocabulary of ILP, a typology of ILP systems and a description of the main techniques in ILP. It then discusses the application of one particular ILP system, FOIL, to the problem of chunking (segmenting) time expressions occurring in natural language text. We employ a propositional knowledge representation that considers features of the individual tokens plus the tokens in a context window of limited size. We trained three rule-based classifiers with FOIL to recognize time expressions, encoded with IOB tags, using annotated data from the ACE 2005 corpus. The evaluation methodology and the results of our experiments are reported in this document. Postprint (published version)
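To make the representation concrete, here is a minimal sketch of IOB tagging for time expressions and of propositional features drawn from a fixed context window. The feature set (lowercased word, digit and title-case flags), the window size of 2, and the example sentence are assumptions for illustration; the abstract does not list the features used, and this code neither invokes FOIL nor reads ACE 2005 data.

```python
def iob_tags(tokens, spans):
    """B/I/O tags for tokens, given time-expression spans as (start, end), end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def token_features(tokens, i, window=2):
    """Propositional features of token i and of its neighbours within +/-window."""
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        tok = tokens[j] if 0 <= j < len(tokens) else "<PAD>"  # pad past sentence edges
        feats[f"word[{offset}]"] = tok.lower()
        feats[f"is_digit[{offset}]"] = tok.isdigit()
        feats[f"is_title[{offset}]"] = tok.istitle()
    return feats


tokens = "The meeting was moved to next Friday at 10 am .".split()
tags = iob_tags(tokens, spans=[(5, 7), (8, 10)])  # "next Friday", "10 am"
for tok, tag in zip(tokens, tags):
    print(f"{tok}\t{tag}")
print(token_features(tokens, 6))
```

Each feature row paired with its gold tag is one training instance. A relational learner such as FOIL would need these encoded as ground facts rather than a Python dict; that conversion is left out here.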

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Verb similarity: Comparing corpus and psycholinguistic data

Author: Irene Castellón
Jordi Turmo
Lara Gil-Vallejo
Marta Coll-Florit
Publication venue: 'Walter de Gruyter GmbH'

Crossref

KNOW2: Language understanding technologies for multilingual domain-oriented information access

Author: Agirre Eneko
Castellón Masalles Irene
Climent Salvador (Climent Roca)
Padró Lluís
Rigau German
Turmo Jordi
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN)
Publication date: 27/02/2019

The goal of the project is to explore integrated environments allowing the cost-effective deployment of vertical information access portals for specific domains. The project started in January 2010 and will last three years.

Dipòsit Digital de la Universitat de Barcelona

Recommendation system for inclusive language use (Sistema de recomendación para un uso inclusivo del lenguaje)

Author: Carrera Jordi
Fuentes Fort Maria
Padró Cirera Montserrat
Padró Lluís
Turmo Borras Jorge
Publication date: 01/01/2009

A system that processes text written in Spanish and detects non-inclusive uses of language. For each suspect noun phrase, the system proposes a set of alternatives. The system also supports the automatic acquisition of positive examples from documents that use inclusive language; these examples, together with their context, are used when presenting suggestions. Peer Reviewed. Postprint (published version)
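A minimal sketch of the detect-and-suggest step, assuming a hand-written lexicon that maps generic-masculine phrases to inclusive alternatives. The lexicon, the regex matching (standing in for real noun-phrase detection), and the `suggest` function are hypothetical; the described system also acquires positive examples automatically, which this sketch does not attempt.

```python
import re

# Hypothetical lexicon: generic-masculine phrase -> inclusive alternatives.
ALTERNATIVES = {
    "los alumnos": ["el alumnado", "los alumnos y las alumnas"],
    "los profesores": ["el profesorado", "el personal docente"],
    "los ciudadanos": ["la ciudadanía"],
}


def suggest(text):
    """Return (matched text, character offset, alternatives), in text order."""
    hits = []
    for phrase, alts in ALTERNATIVES.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.group(0), m.start(), alts))
    return sorted(hits, key=lambda h: h[1])


for phrase, offset, alts in suggest("Los profesores hablaron con los alumnos."):
    print(f"{offset:>3}  {phrase!r} -> {alts}")
```

`re.IGNORECASE` catches sentence-initial capitals, but fixed strings miss modified phrases such as "los nuevos alumnos"; handling those is why the described system works at the noun-phrase level.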

UPCommons. Portal del coneixement obert de la UPC

Secretaría de Estado de Cultura

Advanced semantic textual processing for the detection of diagnostic codes, procedures, concepts and their relationships in health records

Author: Díaz de Ilarraza Sánchez Arantza
Fresno Fernández Víctor
Gojenola Galletebeitia Koldo
Martínez Unanue Raquel
Padró Cirera Lluís
Turmo Borrás Jordi
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2017

The main aim of this project is to develop a set of processors for the automatic analysis of medical texts, making available to the scientific and business community a broad and flexible set of tools and linguistic and semantic resources for the following tasks: morphological, syntactic and semantic analysis adapted to medical texts; assignment of diagnostic and procedure codes to medical reports following the ICD-10 standard; and detection of relationships between concepts. The project will develop tools for Spanish, given its wide use in health systems internationally, and will also explore other languages with different characteristics, such as Catalan and Basque. This contribution has been funded by MINECO (TIN2016-77820-C3-1-R, TIN2016-77820-C3-2-R, TIN2016-77820-C3-3-R and AEI/FEDER, UE).

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Consensus for the prevention of sudden cardiac death in athletes (Consens per a la prevenció de la mort sobtada cardíaca en els esportistes)

Revistes Catalanes amb Accés Obert

Consensus to prevent sudden cardiac death in athletes (Consenso para prevenir la muerte súbita cardíaca de los deportistas)

Revistes Catalanes amb Accés Obert